SUPERVISED LEARNING

Random Forests

Model 1

The following plots and results are generated from a basic random forest model, using all predictors on the training data.

## 
## Call:
##  randomForest(formula = CARAVAN ~ ., data = training) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 9
## 
##         OOB estimate of  error rate: 6.73%
## Confusion matrix:
##      nope yes class.error
## nope 5421  52 0.009501188
## yes   340   8 0.977011494
##  [1] "call"            "type"            "predicted"      
##  [4] "err.rate"        "confusion"       "votes"          
##  [7] "oob.times"       "classes"         "importance"     
## [10] "importanceSD"    "localImportance" "proximity"      
## [13] "ntree"           "mtry"            "forest"         
## [16] "y"               "test"            "inbag"          
## [19] "terms"
## [1] "mean(rf_model$oob.times/rf_model$ntree)"
## [1] 0.3679615
## [1] 0.3678794
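Two quick sanity checks on the numbers above, in base R: the reported 6.73% OOB error is just the misclassified count over the training size (using the counts from the confusion matrix above), and the average out-of-bag fraction, mean(oob.times/ntree), converges to exp(-1).

```r
# OOB error rate from the confusion matrix above:
# 52 misclassified "nope" + 340 misclassified "yes", out of 5473 + 348 rows
oob_error <- (52 + 340) / (5473 + 348)
round(oob_error, 4)  # 0.0673, matching the reported 6.73%

# Each bootstrap sample omits a fraction (1 - 1/n)^n of the rows,
# which converges to exp(-1) -- hence mean(oob.times/ntree) is about 0.368
n <- 5473 + 348
c(omitted_fraction = (1 - 1/n)^n, limit = exp(-1))
```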

##          MeanDecreaseGini
## MOSTYPE      19.290540490
## MAANTHUI      2.388730467
## MGEMOMV       5.886996004
## MGEMLEEF      6.896790658
## MOSHOOFD     11.507867142
## MGODRK        6.235641798
## MGODPR       10.873606174
## MGODOV        8.801322521
## MGODGE       11.755542695
## MRELGE        8.924406839
## MRELSA        5.637167130
## MRELOV        8.150525562
## MFALLEEN      8.461649101
## MFGEKIND     10.837577942
## MFWEKIND     11.523120778
## MOPLHOOG     10.404821207
## MOPLMIDD     11.696380075
## MOPLLAAG     11.328675850
## MBERHOOG      9.737253478
## MBERZELF      5.742246103
## MBERBOER      4.031571715
## MBERMIDD     11.600341220
## MBERARBG     10.074542579
## MBERARBO     10.458507053
## MSKA          8.921620325
## MSKB1         9.587316068
## MSKB2         9.800834909
## MSKC         10.728741842
## MSKD          6.429987007
## MHHUUR       10.063712233
## MHKOOP        9.913822851
## MAUT1         8.233167623
## MAUT2         7.736926570
## MAUT0         7.769931354
## MZFONDS       8.531803901
## MZPART        9.156031783
## MINKM30       8.976436772
## MINK3045     10.730039697
## MINK4575     10.034603200
## MINK7512      8.623682580
## MINK123M      2.982119768
## MINKGEM       8.805354253
## MKOOPKLA     12.844685412
## PWAPART      11.256612352
## PWABEDR       1.310853763
## PWALAND       0.326422759
## PPERSAUT     18.034250797
## PBESAUT       0.757110775
## PMOTSCO       3.397750901
## PVRAAUT       0.002807535
## PAANHANG      1.427554422
## PTRACTOR      1.573676725
## PWERKT        0.033542756
## PBROM         3.228235358
## PLEVEN        4.859588962
## PPERSONG      0.125616097
## PGEZONG       1.819836076
## PWAOREG       1.589095053
## PBRAND       19.378158602
## PZEILPL       0.244184310
## PPLEZIER      5.689015941
## PFIETS        3.062686807
## PINBOED       1.197211207
## PBYSTAND      3.952070788
## AWAPART       7.897881479
## AWABEDR       0.896055314
## AWALAND       0.294738376
## APERSAUT     16.785504217
## ABESAUT       0.732845203
## AMOTSCO       3.590657479
## AVRAAUT       0.015673016
## AAANHANG      1.414617318
## ATRACTOR      0.950283089
## AWERKT        0.025350459
## ABROM         2.271710248
## ALEVEN        6.192875806
## APERSONG      0.168557718
## AGEZONG       1.468139436
## AWAOREG       1.677082308
## ABRAND        7.926989124
## AZEILPL       0.279083880
## APLEZIER      5.588107872
## AFIETS        4.109556053
## AINBOED       1.153145837
## ABYSTAND      3.465965014

Results from the predictions on the testing data are below.

##       rf_predict
##        nope  yes
##   nope 3729   32
##   yes   237    1
## [1] "test-error= 0.0680170042510628"

In comparison with the other models, this version of the random forest actually classifies some observations as “yes.” A general problem with many of the other models is that, although they have better testing error rates, they never predict the “yes” outcome on the testing data. In other words, the false positive rate is essentially zero, but the false negative rate is very high.
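The false-negative problem is easiest to see as sensitivity and specificity, computed in base R from the test confusion matrix above: only 1 of 238 true “yes” cases is caught.

```r
# Test-set confusion matrix from above (rows = truth, columns = prediction)
conf <- matrix(c(3729, 32,
                  237,  1),
               nrow = 2, byrow = TRUE,
               dimnames = list(truth = c("nope", "yes"),
                               pred  = c("nope", "yes")))

sensitivity <- conf["yes", "yes"] / sum(conf["yes", ])    # true-positive rate
specificity <- conf["nope", "nope"] / sum(conf["nope", ]) # true-negative rate
round(c(sens = sensitivity, spec = specificity), 4)
```

Specificity is near 1 while sensitivity is near 0: the model almost never flags a caravan-policy buyer.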

Model 2

This model uses the caret package with repeated 5-fold cross-validation to tune the random forest. Note that caret treats the 0/1 outcome as numeric here, so it fits a regression forest and selects mtry by RMSE rather than accuracy.
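The “Summary of sample sizes” line below comes from 5-fold partitions of the 5822 rows: each resample trains on the other four folds, i.e. roughly 4/5 of the data. A minimal base-R sketch of one such repeat (caret generates these indices internally, with stratification):

```r
set.seed(1)
n <- 5822
# Assign each row to one of 5 folds; each resample trains on the other 4
fold_id <- sample(rep(1:5, length.out = n))
train_sizes <- sapply(1:5, function(k) sum(fold_id != k))
train_sizes  # each is 4657 or 4658, matching the summary of sample sizes
```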

## Random Forest 
## 
## 5822 samples
##   85 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 4 times) 
## Summary of sample sizes: 4657, 4657, 4658, 4658, 4658, 4658, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE       Rsquared    MAE      
##    1    0.2328000  0.04598831  0.1098930
##    2    0.2330799  0.03550816  0.1083869
##    3    0.2353624  0.03112684  0.1084650
##    4    0.2366904  0.03021162  0.1085802
##    5    0.2372925  0.03087102  0.1086585
##    6    0.2377683  0.03117739  0.1088583
##    7    0.2380816  0.03187034  0.1089887
##    8    0.2381443  0.03295400  0.1091076
##    9    0.2382825  0.03371327  0.1092386
##   10    0.2385889  0.03364026  0.1094089
##   11    0.2388074  0.03387505  0.1094490
##   12    0.2391214  0.03391453  0.1096743
##   13    0.2391256  0.03452624  0.1096910
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 1.

##  [1] "method"       "modelInfo"    "modelType"    "results"     
##  [5] "pred"         "bestTune"     "call"         "dots"        
##  [9] "metric"       "control"      "finalModel"   "preProcess"  
## [13] "trainingData" "resample"     "resampledCM"  "perfNames"   
## [17] "maximize"     "yLimits"      "times"        "levels"
##    mtry      RMSE   Rsquared       MAE      RMSESD RsquaredSD       MAESD
## 1     1 0.2328000 0.04598831 0.1098930 0.011101017 0.01508267 0.004102595
## 2     2 0.2330799 0.03550816 0.1083869 0.010401804 0.01254881 0.004249404
## 3     3 0.2353624 0.03112684 0.1084650 0.009810852 0.01128211 0.004352661
## 4     4 0.2366904 0.03021162 0.1085802 0.009633047 0.01098039 0.004456811
## 5     5 0.2372925 0.03087102 0.1086585 0.009615064 0.01146871 0.004565664
## 6     6 0.2377683 0.03117739 0.1088583 0.009453645 0.01170839 0.004600833
## 7     7 0.2380816 0.03187034 0.1089887 0.009494438 0.01246559 0.004768156
## 8     8 0.2381443 0.03295400 0.1091076 0.009577769 0.01242227 0.004828244
## 9     9 0.2382825 0.03371327 0.1092386 0.009748927 0.01255698 0.004784547
## 10   10 0.2385889 0.03364026 0.1094089 0.009614049 0.01283911 0.005035203
## 11   11 0.2388074 0.03387505 0.1094490 0.009595494 0.01292354 0.004970173
## 12   12 0.2391214 0.03391453 0.1096743 0.009720219 0.01347034 0.005040520
## 13   13 0.2391256 0.03452624 0.1096910 0.009787493 0.01345622 0.005023032
## [1] 1
## 
## Call:
##  randomForest(x = x, y = y, ntree = 500, mtry = param$mtry) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.05429577
##                     % Var explained: 3.39
##          IncNodePurity
## MOSTYPE    0.743560330
## MAANTHUI   0.166974215
## MGEMOMV    0.391050207
## MGEMLEEF   0.352890420
## MOSHOOFD   0.710967602
## MGODRK     0.297801894
## MGODPR     0.535078567
## MGODOV     0.430770941
## MGODGE     0.577898760
## MRELGE     0.449593568
## MRELSA     0.340270561
## MRELOV     0.430620689
## MFALLEEN   0.435395010
## MFGEKIND   0.497923465
## MFWEKIND   0.551837314
## MOPLHOOG   0.660334862
## MOPLMIDD   0.634607120
## MOPLLAAG   0.815324589
## MBERHOOG   0.572577109
## MBERZELF   0.344828367
## MBERBOER   0.301749635
## MBERMIDD   0.632354343
## MBERARBG   0.644368507
## MBERARBO   0.491811274
## MSKA       0.518556366
## MSKB1      0.490673633
## MSKB2      0.482227654
## MSKC       0.569997722
## MSKD       0.379726342
## MHHUUR     0.549356984
## MHKOOP     0.660330637
## MAUT1      0.488537783
## MAUT2      0.381777996
## MAUT0      0.487725256
## MZFONDS    0.545168105
## MZPART     0.504422297
## MINKM30    0.558929253
## MINK3045   0.542194880
## MINK4575   0.594042967
## MINK7512   0.507489702
## MINK123M   0.192683494
## MINKGEM    0.595335181
## MKOOPKLA   0.711756595
## PWAPART    0.610017974
## PWABEDR    0.125175523
## PWALAND    0.053535567
## PPERSAUT   1.275149806
## PBESAUT    0.068513317
## PMOTSCO    0.198154608
## PVRAAUT    0.003553985
## PAANHANG   0.118903719
## PTRACTOR   0.144823831
## PWERKT     0.009248035
## PBROM      0.194247172
## PLEVEN     0.229260885
## PPERSONG   0.025697951
## PGEZONG    0.234168957
## PWAOREG    0.165447428
## PBRAND     0.897479938
## PZEILPL    0.051455561
## PPLEZIER   0.686684753
## PFIETS     0.195553855
## PINBOED    0.131866004
## PBYSTAND   0.306637901
## AWAPART    0.516916255
## AWABEDR    0.101885921
## AWALAND    0.054104476
## APERSAUT   1.129552998
## ABESAUT    0.043628080
## AMOTSCO    0.159442944
## AVRAAUT    0.004313869
## AAANHANG   0.136451027
## ATRACTOR   0.089428773
## AWERKT     0.007244795
## ABROM      0.143473235
## ALEVEN     0.360119326
## APERSONG   0.023338078
## AGEZONG    0.127028965
## AWAOREG    0.189216313
## ABRAND     0.425363522
## AZEILPL    0.056982155
## APLEZIER   0.599573585
## AFIETS     0.272364777
## AINBOED    0.118262845
## ABYSTAND   0.301024337

Results from the predictions on the testing data are below. This model exhibits the same false-negative problem, although the testing error is improved.
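Because this forest is a regression, its raw predictions are numeric scores, and the all-zero column below means no score reached the cutoff used to call a “yes.” A hedged sketch of that conversion with toy scores (the 0.5 cutoff is an assumption, not shown in the output):

```r
# Hypothetical numeric scores from a regression forest (illustrative only)
scores <- c(0.02, 0.10, 0.47, 0.61, 0.05)
threshold <- 0.5  # assumed cutoff; lowering it trades false negatives for false positives
pred_class <- ifelse(scores > threshold, "yes", "nope")
table(pred_class)
```

Lowering the threshold is one simple way to force some “yes” predictions at the cost of more false positives.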

##       RFcaret_predict
##           0
##   nope 3761
##   yes   238

Model 3

This model uses the caret package with repeated 5-fold cross-validation to tune a classification random forest, so mtry is selected by accuracy rather than RMSE.

## Random Forest 
## 
## 5821 samples
##   85 predictor
##    2 classes: 'nope', 'yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 4 times) 
## Summary of sample sizes: 4657, 4656, 4657, 4657, 4657, 4658, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##    2    0.9402166  0.00000000
##   43    0.9247559  0.05257694
##   85    0.9214491  0.04739453
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
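The mtry = 2 model wins on raw accuracy only because it predicts “nope” for every row: its accuracy equals the majority-class base rate 5473/5821, and Cohen’s Kappa is 0 for such a constant classifier, exactly as the table shows. In base R:

```r
n_nope <- 5473; n_yes <- 348; n <- n_nope + n_yes

# Accuracy of the always-"nope" classifier is the majority-class base rate
accuracy <- n_nope / n
round(accuracy, 7)  # ~0.9402166, matching the Accuracy column above

# Cohen's Kappa: (observed - expected agreement) / (1 - expected agreement).
# The predictor always says "nope", so expected agreement equals the base rate.
p_obs <- accuracy
p_exp <- (n_nope / n) * 1 + (n_yes / n) * 0
kappa <- (p_obs - p_exp) / (1 - p_exp)
kappa  # 0: no skill beyond always guessing the majority class
```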

##  [1] "method"       "modelInfo"    "modelType"    "results"     
##  [5] "pred"         "bestTune"     "call"         "dots"        
##  [9] "metric"       "control"      "finalModel"   "preProcess"  
## [13] "trainingData" "resample"     "resampledCM"  "perfNames"   
## [17] "maximize"     "yLimits"      "times"        "levels"      
## [21] "terms"        "coefnames"    "xlevels"
##   mtry  Accuracy      Kappa   AccuracySD    KappaSD
## 1    2 0.9402166 0.00000000 0.0004131887 0.00000000
## 2   43 0.9247559 0.05257694 0.0036133103 0.02620041
## 3   85 0.9214491 0.04739453 0.0041394355 0.02852629
## [1] 2
## 
## Call:
##  randomForest(x = x, y = y, ntree = 200, mtry = param$mtry) 
##                Type of random forest: classification
##                      Number of trees: 200
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 5.98%
## Confusion matrix:
##      nope yes class.error
## nope 5473   0           0
## yes   348   0           1
##          MeanDecreaseGini
## MOSTYPE       7.460994278
## MAANTHUI      1.367430411
## MGEMOMV       3.441084340
## MGEMLEEF      3.865874161
## MOSHOOFD      5.732377383
## MGODRK        3.111255695
## MGODPR        5.237797377
## MGODOV        3.927947040
## MGODGE        5.158741750
## MRELGE        4.468034659
## MRELSA        2.915211775
## MRELOV        3.910238797
## MFALLEEN      4.312445020
## MFGEKIND      4.844589245
## MFWEKIND      5.525781607
## MOPLHOOG      4.968570133
## MOPLMIDD      5.236016673
## MOPLLAAG      5.534982268
## MBERHOOG      4.777145078
## MBERZELF      2.673033018
## MBERBOER      2.409578543
## MBERMIDD      5.892645710
## MBERARBG      5.099924451
## MBERARBO      4.779756788
## MSKA          4.757661369
## MSKB1         4.885326247
## MSKB2         4.679449883
## MSKC          5.324309212
## MSKD          3.642837757
## MHHUUR        5.416461357
## MHKOOP        5.033022913
## MAUT1         4.729654171
## MAUT2         3.590993827
## MAUT0         3.918990055
## MZFONDS       4.887238960
## MZPART        4.683533914
## MINKM30       4.566165311
## MINK3045      5.241399678
## MINK4575      5.030265239
## MINK7512      4.427956224
## MINK123M      1.734264273
## MINKGEM       4.701371683
## MKOOPKLA      6.063312529
## PWAPART       4.331455165
## PWABEDR       0.535696163
## PWALAND       0.320811176
## PPERSAUT      8.440227748
## PBESAUT       0.294589525
## PMOTSCO       1.449449924
## PVRAAUT       0.006277594
## PAANHANG      0.960757972
## PTRACTOR      1.010132035
## PWERKT        0.017808339
## PBROM         1.052973276
## PLEVEN        2.117389143
## PPERSONG      0.082290395
## PGEZONG       0.996478594
## PWAOREG       0.614634330
## PBRAND        7.582739567
## PZEILPL       0.195487031
## PPLEZIER      3.324950213
## PFIETS        1.199759095
## PINBOED       0.629966191
## PBYSTAND      2.126018970
## AWAPART       3.367936734
## AWABEDR       0.442986635
## AWALAND       0.257595109
## APERSAUT      7.584040013
## ABESAUT       0.339407427
## AMOTSCO       1.209254816
## AVRAAUT       0.007369495
## AAANHANG      0.770375556
## ATRACTOR      0.577837250
## AWERKT        0.040560219
## ABROM         0.807628570
## ALEVEN        2.573087573
## APERSONG      0.152421861
## AGEZONG       0.532863602
## AWAOREG       0.906036760
## ABRAND        2.963367019
## AZEILPL       0.221192549
## APLEZIER      3.342759305
## AFIETS        1.791309078
## AINBOED       0.625594376
## ABYSTAND      1.877529787

Results from the predictions on the testing data are below.

##       RFcaret_predict2
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"

Boosting

Model 1

This model uses gbm boosting.

Results from the predictions on the testing data are below. The testing error is the same as the caret random forest model (Model 2).

##       boosting_predict
##           0
##   nope 3761
##   yes   238
## Accuracy    Kappa 
##       NA       NA
## [1] "test-error= 0.0595148787196799"
## Area under the curve: 0.5

Model 2

This model uses xgboost to perform boosting.
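The train-error values printed below are misclassification rates at a 0.5 probability cutoff, which is how xgboost defines its default error metric for binary classification. The arithmetic in base R, with toy scores (the actual predicted probabilities are not shown in the output):

```r
# Toy predicted probabilities and 0/1 labels (illustrative only)
probs  <- c(0.9, 0.2, 0.6, 0.1, 0.4)
labels <- c(1,   0,   0,   0,   1)

# train-error: fraction of wrong calls at the 0.5 cutoff
train_error <- mean(as.integer(probs > 0.5) != labels)
train_error  # 0.4 for this toy example
```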

## [1]  train-error:0.057207 
## [2]  train-error:0.057550 
## [3]  train-error:0.057378 
## [4]  train-error:0.056863 
## [5]  train-error:0.057207

Results from the predictions on the testing data are below. The testing error of this model is slightly higher than the caret models, but it does make predictions in the “yes” category, and it improves on the original random forest model (Model 1).

## [1] "Mean relative difference: 12.28571"
##    xgboost_predict
##        0    1
##   0 3740   21
##   1  237    1
## [1] "test-error= 0.0645161290322581"

SVMs

Model 1

This model uses linear support vector machines for classification.

## 
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.01176471 
## 
## Number of Support Vectors:  2199
## 
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.01176471 
## 
## Number of Support Vectors:  2199
## 
##  ( 1851 348 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  nope yes
##       pred.train
##        nope  yes
##   nope 5473    0
##   yes   348    0
## [1] "train-error= 0.0597835423466758"

Results from predictions on the testing data are below. The testing error is essentially the same as the training error, and it matches the other models.

##       pred.test
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"

The model was tuned over a grid of cost values; the tuning results and best model summary are displayed below.
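The 16 rows in the tuning table come from the cost grid 2^(-5:10), visible in the best.tune call below:

```r
# Candidate cost values spanning 2^-5 through 2^10 on a log scale
cost_grid <- 2^(-5:10)
length(cost_grid)  # 16 rows, matching the tuning table
range(cost_grid)   # 0.03125 to 1024
```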

##         cost      error  dispersion
## 1  3.125e-02 0.05995267 0.009845951
## 2  6.250e-02 0.06012449 0.009797777
## 3  1.250e-01 0.06012449 0.009797777
## 4  2.500e-01 0.06029631 0.009745999
## 5  5.000e-01 0.06012449 0.009797777
## 6  1.000e+00 0.06012449 0.009797777
## 7  2.000e+00 0.06012449 0.009797777
## 8  4.000e+00 0.06029631 0.009745999
## 9  8.000e+00 0.06029631 0.009745999
## 10 1.600e+01 0.06012449 0.009797777
## 11 3.200e+01 0.06012449 0.009797777
## 12 6.400e+01 0.06012449 0.009797777
## 13 1.280e+02 0.05995267 0.009845951
## 14 2.560e+02 0.05995267 0.009845951
## 15 5.120e+02 0.05995267 0.009845951
## 16 1.024e+03 0.05995267 0.009845951
## 
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training, 
##     ranges = list(cost = 2^(-5:10)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.03125 
##       gamma:  0.01176471 
## 
## Number of Support Vectors:  1535
##       
##        nope  yes
##   nope 5473    0
##   yes   348    0
##       tune_prediction
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"

The testing error does not improve with tuning.

Model 2

The following model uses a polynomial-kernel SVM.
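The polynomial kernel is K(x, y) = (gamma * <x, y> + coef0)^degree, so the parameters shown below (degree = 1, gamma = 1, coef0 = 0) collapse it to the plain inner product; this fit is effectively the linear SVM again, which is consistent with the identical support-vector count and error. A base-R check:

```r
# Polynomial kernel as used by e1071's svm
poly_kernel <- function(x, y, degree = 1, gamma = 1, coef0 = 0) {
  (gamma * sum(x * y) + coef0)^degree
}

x <- c(1, 2, 3); y <- c(0.5, -1, 2)
# degree = 1, gamma = 1, coef0 = 0 reduces to the linear kernel <x, y>
c(poly = poly_kernel(x, y), linear = sum(x * y))
```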

## 
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "polynomial", 
##     degree = 1, gamma = 1, coef0 = 0)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  1 
##       gamma:  1 
##      coef.0:  0 
## 
## Number of Support Vectors:  2199
## 
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "polynomial", 
##     degree = 1, gamma = 1, coef0 = 0)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  1 
##       gamma:  1 
##      coef.0:  0 
## 
## Number of Support Vectors:  2199
## 
##  ( 1851 348 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  nope yes
##       pred.train
##        nope  yes
##   nope 5473    0
##   yes   348    0
## [1] "train-error= 0.0597835423466758"

Results from predictions on the testing data are below. The testing error is not improved from previous models.

##       pred.test
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"

The following results are from tuning on the polynomial SVM.

##        cost      error dispersion
## 1   0.03125 0.05995915 0.01305886
## 2   0.06250 0.05995915 0.01305886
## 3   0.12500 0.06013097 0.01307276
## 4   0.25000 0.06013097 0.01307276
## 5   0.50000 0.06013097 0.01307276
## 6   1.00000 0.06013097 0.01307276
## 7   2.00000 0.06047462 0.01331658
## 8   4.00000 0.06047462 0.01331658
## 9   8.00000 0.06047462 0.01331658
## 10 16.00000 0.06013097 0.01307276
## 11 32.00000 0.06030279 0.01318404
## 
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training, 
##     ranges = list(cost = 2^(-5:5)), kernel = "polynomial", degree = 1:3, 
##     gamma = 1, coef0 = 1)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.03125 
##      degree:  1 2 3 
##       gamma:  1 
##      coef.0:  1 
## 
## Number of Support Vectors:  1503
##       
##        nope  yes
##   nope 5473    0
##   yes   348    0

The following results are from predictions on the testing data. There is no improvement in the testing error.

##       pred.test
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"

The following model uses a radial SVM, with tuning.
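For reference, the radial kernel is K(x, y) = exp(-gamma * ||x - y||^2). The small tuned gamma below (0.03125) keeps all kernel values close to 1, which flattens the decision surface and helps explain why the tuned model falls back to the majority class. A base-R sketch:

```r
# Radial (RBF) kernel as used by e1071's svm
rbf_kernel <- function(x, y, gamma) exp(-gamma * sum((x - y)^2))

x <- c(1, 2); y <- c(3, 0)
# A small gamma keeps kernel values near 1; a larger gamma makes them decay fast
c(small_gamma = rbf_kernel(x, y, gamma = 0.03125),
  large_gamma = rbf_kernel(x, y, gamma = 1))
```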

##       predsvm5.train
##        nope  yes
##   nope 5473    0
##   yes   331   17
## [1] "train-error= 0.056863081944683"
## 
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training, 
##     ranges = list(cost = 2^(-5:5), gamma = 2^(-5:0)), kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.03125 
##       gamma:  0.03125 
## 
## Number of Support Vectors:  2010
##       
##        nope  yes
##   nope 5473    0
##   yes   348    0
## [1] "tune-error= 0.0597835423466758"

Neither the tuned nor the untuned model improves the testing error.

##       predsvm5.test
##        nope  yes
##   nope 3761    0
##   yes   238    0
##       predsvm5.tunetest
##        nope  yes
##   nope 3761    0
##   yes   238    0
## [1] "test-error= 0.0595148787196799"
## [1] "tuned test-error= 0.0595148787196799"

UNSUPERVISED LEARNING

K-means clustering

The following model uses K-means clustering. Scaling leads to significantly different results.
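Scaling matters because k-means minimizes within-cluster Euclidean distances, so a feature with much larger variance dominates the solution unless the columns are standardized. A small runnable illustration on toy data (illustrative, not the CARAVAN data):

```r
set.seed(42)
# Two features on very different scales: without scaling, the large-variance
# feature dominates the Euclidean distances that k-means minimizes
x <- cbind(big = rnorm(300, sd = 100), small = rep(c(0, 1), each = 150))

km_raw    <- kmeans(x, centers = 3, nstart = 20)
km_scaled <- kmeans(scale(x), centers = 3, nstart = 20)

# The two fits typically partition the data quite differently
km_raw$size
km_scaled$size
```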

Representative plots are displayed below, using the first two variables in the training data for both scaled and unscaled data.

## List of 9
##  $ cluster     : int [1:5821] 2 3 1 1 3 3 2 3 2 1 ...
##  $ centers     : num [1:3, 1:36] 1.046 0.468 0.645 4.665 5.057 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "1" "2" "3"
##   .. ..$ : chr [1:36] "MGODRK" "MGODPR" "MGODOV" "MGODGE" ...
##  $ totss       : num 636318
##  $ withinss    : num [1:3] 139945 147064 171156
##  $ tot.withinss: num 458164
##  $ betweenss   : num 178154
##  $ size        : int [1:3] 1668 2072 2081
##  $ iter        : int 4
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"
## List of 9
##  $ cluster     : int [1:5821] 1 2 1 1 2 3 3 3 3 1 ...
##  $ centers     : num [1:3, 1:36] 0.2856 0.1014 -0.3801 0.0472 -0.3203 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "1" "2" "3"
##   .. ..$ : chr [1:36] "MGODRK" "MGODPR" "MGODOV" "MGODGE" ...
##  $ totss       : num 209520
##  $ withinss    : num [1:3] 72559 40655 53492
##  $ tot.withinss: num 166706
##  $ betweenss   : num 42814
##  $ size        : int [1:3] 2201 1552 2068
##  $ iter        : int 3
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"

Hierarchical Clustering

The following plots display hierarchical clustering on the scaled data.
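The clusters above come from cutting a dendrogram built on a Euclidean distance matrix, as the dist output below shows. A minimal base-R sketch of the same pipeline on toy data:

```r
set.seed(7)
# Toy data: two well-separated groups of 20 points each
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))

d  <- dist(x)            # Euclidean distances, as in the output below
hc <- hclust(d)          # complete linkage by default
cl <- cutree(hc, k = 2)  # cut the dendrogram into 2 clusters
table(cl)
```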

## Class 'dist'  atomic [1:16939110] 7.27 6.28 11.65 8.43 5.04 ...
##   ..- attr(*, "Size")= int 5821
##   ..- attr(*, "Diag")= logi FALSE
##   ..- attr(*, "Upper")= logi FALSE
##   ..- attr(*, "method")= chr "euclidean"
##   ..- attr(*, "call")= language dist(x = training_u2)
## cluster1
##    1    2    3    4    5 
## 4678  703  403   35    2
## cluster2
##    1    2    3    4    5 
## 4678  703  403   35    2
##         cluster2
## cluster1    1    2    3    4    5
##        1 4678    0    0    0    0
##        2    0  703    0    0    0
##        3    0    0  403    0    0
##        4    0    0    0   35    0
##        5    0    0    0    0    2

The following plots display hierarchical clustering on the unscaled data.

MDS Plots

MDS plots were generated using cmdscale. Plots are displayed for both scaled and unscaled data.
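cmdscale performs classical (metric) multidimensional scaling: given a distance matrix, it finds low-dimensional coordinates whose pairwise distances match the input as closely as possible. For Euclidean input distances the reconstruction is exact, as this small base-R example shows:

```r
# Three points forming a 3-4-5 right triangle
x <- matrix(c(0, 0,
              3, 0,
              0, 4), ncol = 2, byrow = TRUE)

# Embed in k = 2 dimensions, as in the plots above
mds <- cmdscale(dist(x), k = 2)
dim(mds)          # 3 points, 2 coordinates
round(dist(mds), 6)  # recovers the original distances 3, 4, 5
```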

## [1] 5821   36
## [1] 2.431997e-15 6.345407e-16